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Abstract 

Background: In eukaryotic organisms, packaging of DNA into nucleosomes controls gene expression by regulating 
access of the promoter to transcription factors. The human malaria parasite Plasmodium falciparum encodes 
relatively few transcription factors, while extensive nucleosome remodeling occurs during its replicative cycle in red 
blood cells. These observations point towards an important role of the nucleosome landscape in regulating gene 
expression. However, the relation between nucleosome positioning and transcriptional activity has thus far not 
been explored in detail in the parasite. 

Results: Here, we analyzed nucleosome positioning in the asexual and sexual stages of the parasite's erythrocytic 
cycle using chromatin immunoprecipitation of MNase-digested chromatin, followed by next-generation sequencing. 
We observed a relatively open chromatin structure at the trophozoite and gametocyte stages, consistent with high 
levels of transcriptional activity in these stages. Nucleosome occupancy of genes and promoter regions were 
subsequently compared to steady-state mRNA expression levels. Transcript abundance showed a strong inverse 
correlation with nucleosome occupancy levels in promoter regions. In addition, AT-repeat sequences were strongly 
unfavorable for nucleosome binding in P. falciparum, and were overrepresented in promoters of highly expressed 
genes. 

Conclusions: The connection between chromatin structure and gene expression in P. falciparum shares similarities 
with other eukaryotes. However, the remarkable nucleosome dynamics during the erythrocytic stages and the 
absence of a large variety of transcription factors may indicate that nucleosome binding and remodeling are critical 
regulators of transcript levels. IVloreover, the strong dependency between chromatin structure and DNA sequence 
suggests that the P. falciparum genome may have been shaped by nucleosome binding preferences. Nucleosome 
remodeling mechanisms in this deadly parasite could thus provide potent novel anti-malarial targets. 

Keywords: Malaria, Cell cycle, Nucleosome, Transcription, Sequence 



Background H2A, H2B, H3, and H4. The positioning of these nucleo- 
The genomes of eukaryotic organisms are wrapped somes along the genome is important for control of gene 
around histone proteins to form a condensed protein- expression by regulating access of DNA to transcription 
DNA complex known as chromatin [1]. The basic pack- factors and other DNA binding proteins [2-5]. Nucleo- 
aging unit, the nucleosome, consists of approximately 147 somes are therefore not arranged randomly, but show a 
base pairs of DNA wrapped around a histone octamer, distinct distribution around genes. This nucleosome land- 
containing two copies of each of the histone proteins scape was first observed in Saccharomyces cerevisiae, and 

was later confirmed for higher eukaryotes, such as worms, 
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positioned +1 nucleosome that overlaps the transcription 
start site. The coding region is pacl<aged by an array of 
nucleosomes with increasingly fuzzy positioning towards 
the 3 ' end. Finally, another nucleosome-depleted region is 
present at the 3 ' end of the gene where RNA polymerase 
II terminates transcription. 

Nucleosome positioning is influenced by binding of 
other proteins and protein complexes to the DNA that 
act as barriers, as well as by chromatin remodeling en- 
zymes that actively shape the nucleosome landscape. 
However, one of the strongest determinants of nucleo- 
some positioning is the DNA sequence itself. In euka- 
ryotes such as S. cerevisiae and humans, relatively rigid 
stretches of deoxyadenosines (poly[dA:dT]) decrease 
nucleosome binding affinity [8,9]. Poly(dA:dT) tracts are 
often found in the nucleosome-depleted regions that 
demarcate eukaryotic core promoters [2,3,5] and their 
length and location relative to the transcription start site 
(TSS) have been shown to influence promoter activity 
[10]. In addition, nucleosome-bound DNA shows a 
10 bp periodicity of anti-phased A/T and G/C dinucleo- 
tides that corresponds to the helical turn of DNA 
wrapped around the histone core [11-13]. 

The human malaria parasite, Plasmodium falciparum, 
yearly responsible for an estimated 660,000 deaths [14], 
has the most highly AT-rich genome sequenced to date. 
The AT-content averages 80.7% genome-wide, but reaches 
90-95% in intergenic regions [15]. SimOar to other 
eukaryotic genomes, the P. falciparum genome is bound 
by nucleosomes, but the formation of tightly packed 
heterochromatin may be prevented by the absence of 
linker histone HI, of which to date no homologue has 
been detected in P. falciparum. The P. falciparum genome 
is relatively depleted of transcription factors, but encodes 
a large amount of chromatin remodeling enzymes [16]. 
Accordingly, the nucleosome landscape of P. falciparum 
has been proposed to be important for regulation of gene 
expression during the stage of its life cycle responsible for 
disease in humans [17-19]. This asexual erythrocytic stage 
is characterized by parasite invasion of a red blood cell, 
followed by multiplication and the release of 16-32 
daughter cells after a cell cycle of approximately 48 hours. 
Differentiation of asexual parasites into male or female 
gametocytes allows transmission of P. falciparum to 
mosquitos. Understanding the regulatory mechanisms for 
sexual differentiation is critical for the development of 
strategies aimed at preventing transmission of malaria. 

In a previous study, we analyzed the nucleosome land- 
scape of the P. falciparum genome during the asexual 
cycle by a combination of micrococcal nuclease (MNase) 
assisted isolation of nucleosome-bound DNA (MAINE) 
and formaldehyde-assisted isolation of protein-free DNA 
(FAIRE), coupled to next-generation sequencing [18]. 
Due to the relatively low coverage of these data sets 



(generated in 2008), we were unable to perform an in- 
depth study of correlations between transcriptional activ- 
ity and nucleosome positioning, in particular for promoter 
regions. Here, we generated a new, high- coverage, nucleo- 
some positioning data set and uncovered a strong de- 
pendency between sequence composition, nucleosome 
occupancy and transcriptional activity of both promoter 
and gene regions during the asexual and sexual stages of 
P. falciparum. This study confirms that one of the main 
determinants of nucleosome positioning in eukaryotes is 
encoded in the DNA sequence itself and that changes in 
nucleosome occupancy greatly influence gene expression. 
However, given the strong dynamics in nucleosome po- 
sitioning observed during the erythrocytic cycle and the 
relative paucity of transcription factors, these results 
suggest that the parasite may have developed its nucleo- 
some landscape as a key mechanism to regulate gene 
expression. 

Results 

Generation of high-resolution nucleosome positioning 
maps 

A detailed analysis of the relation between the nucleosome 
landscape and gene expression during the human malaria 
parasite's asexual cycle has thus far been hampered by a 
lack of high-resolution nucleosome maps. In addition, 
nucleosome maps were not yet available for the sexually 
mature gametocyte stage. To extend and complement 
existing nucleosome occupancy data sets [17,18,20], we 
performed two variants of chromatin immunoprecipita- 
tion coupled to next generation sequencing (ChlP-Seq) 
for nucleosome mapping using either sonication (sonic- 
ation ChlP-Seq) or MNase digestion (MNase ChlP-Seq) 
to fragment crosslinked chromatin into mononucleo- 
somes (Additional file 1: Figure SI). Antibodies used for 
immunoprecipitations were either directed against H4, 
which is universally present in all nucleosomes, or against 
an epitope shared between H3 and H3.3, which enabled 
pull-down of all nucleosomes with the exception of 
centromeric nucleosomes that contain the variant histone 
CenH3. Nucleosome-bound DNA fragments were subse- 
quently analyzed by next-generation sequencing (lUumina 
HiSeq 2000), generating 50 base pair (bp) paired-end 
sequence reads. Reads were first mapped to the human 
genome, followed by mapping of all remaining reads to 
the P. falciparum genome (Additional file 1: Figure SI, 
Table SI). 

While digestion of chromatin by MNase has been re- 
ported to be biased by the preference of MNase for 
AT-rich sequences [21], others have found no evidence for 
such claim [22]. The use of a non-enzymatic fragmenta- 
tion method and the specific enrichment for nucleosome- 
bound DNA by immunoprecipitation should rule out that 
nucleosome-sized fragments were obtained merely as a 
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potential sequence-based artifact of MNase digestion. 
MNase digestion resulted in a narrow distribution of 
mononucleosomal DNA fragments (average size = 130 bp) 
while sonication yielded larger fragments (average size = 
253 bp; Additional file 1: Figure S2). Sonication thus 
seems to leave relatively long tails of DNA that are not 
part of the nucleosome itself, and may sometimes yield in- 
tact di- or trinucleosomes, indicating that MNase-based 
fragmentation of chromatin provides a higher resolution 
as compared to sonication. 

Extensive visual inspection of MAINE -Seq [18] and 
sonication/MNase ChlP-Seq tracks in a genome browser 
confirmed very similar distribution patterns across the 
genome (Figure lA). Accordingly, we observed high cor- 
relations between MAINE -Seq [18], MNase ChlP-Seq, 
and sonication ChlP-Seq (Additional file 1: Figure S3), 
indicating that we obtained similar nucleosome maps 
despite differences in methodology. We therefore used 
MNase ChlP-Seq data for all subsequent analyses of the 
asexual cycle. However, as ChlP-Seq generally yields 
relatively low amounts of DNA, we analyzed the nucleo- 
some landscape of sexually mature parasites (gameto- 
cytes) using MAINE-Seq. 

Transcriptionally active stages show global depletion in 
nucleosome levels 

Nucleosome dynamics were analyzed in synchronized 
P. falciparum cultures that were harvested directly after 



invasion of red blood cells at the ring stage, and at 
the trophozoite and schizont stages, as well as at the 
sexually mature gametocyte stage. The amount of chro- 
matin used for immunoprecipitations was equivalent for 
all stages of the asexual cell cycle. However, DNA yields 
obtained from ChlPs consistently showed a reduced re- 
covery at the trophozoite stage (Figure IB). These results 
are consistent with data from a wide variety of experi- 
mental approaches, including western blot analysis [23] 
and mass spectrometry analysis [24] (Figure IC), that 
showed reduced histone levels at the trophozoite stage, 
which is known to be the most transcriptionally active 
stage during the asexual cell cycle [25,26]. While detec- 
tion of histone proteins decreased at the trophozoite 
stage as compared to the ring and schizont stages, mass 
spectrometry data showed an increased abundance of 
RNA polymerase II [24] (Figure IC), further validating a 
potential link between histone levels and transcriptional 
activity. 

Next, we determined the number of nucleosomes and 
the average nucleosome level at each stage using the nu- 
cleosome positioning tool PuFFIN [27]. PuFFIN was de- 
signed in our group specifically for paired-end sequence 
reads. Extensive experimental results showed that PuFFIN 
outperforms other available nucleosome detection tools in 
terms of accuracy and number of nucleosomes reported 
along the genome. The number of reads mapped to each 
nucleosome was used to estimate nucleosome levels. To 
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Figure 1 Validation of data sets. A. Genome browser view (Artemis [56]) sinowing genome coverage for a section of chromosome 4, with the 
gene model indicated in the top. Genes located on the forward and the reverse strands are indicated in yellow and orange, respectively. 
Sequencing of genomic DNA resulted in an even distribution of reads across intergenic and coding regions. Sequence read coverage for MAINE-Seq 
[18], Sonication ChlP-Seq and MNase ChlP-Seq was similar and showed increased coverage of coding region as compared to intergenic regions, while 
opposite results were observed for FAIRE-Seq [18]. B. DNA yields of immunoprecipitations using a non-specific antibody, an antibody directed against 
histone H3 (Abeam ab1971) and an antibody directed against histone H4 (Millipore 05-858). These data show a reduced amount of DNA obtained 
from trophozoite samples, despite using the same amount of input material for each immunoprecipitation. Since low amounts of DNA were obtained 
using the anti-H4 antibody, we concluded that this antibody was not well suited for ChlP-Seq experiments and was therefore not used for further 
analysis. C. Total spectral counts for all histone proteins (left) and all annotated RNA polymerase II proteins (right) obtained by mass spectrometry 
analysis of nuclear fractions from ring, trophozoite and schizont stage parasites [24]. Abundance levels of histones decreased during the trophozoite 
stage, while the amount of RNA polymerase II increased, consistent with high transcriptional activity in this stage, but not the ring or schizont stages. 
Spectral count data were obtained from PlasmoDB (www.plasmodb.org). 
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compare nucleosome levels between different stages, we 
normalized nucleosome scores using the total number of 
reads mapped to both human and P. falciparum genomes 
(see Materials and methods). In addition, we validated our 
nucleosome positioning data using a publicly available nu- 
cleosome positioning tool NOrMAL [28] . 

The number of detected nucleosomes substantially in- 
creased in comparison with previous low-coverage 
MAINE-Seq data [18] (Figure 2A). The number of loca- 
tions where nucleosomes were present did not change dra- 
matically during the cell cycle (Figure 2A) and were not 
greatly affected by differences in sequencing library size 
between the different stages (Additional fOe 1: Figure S4). 
However, nucleosome scores showed approximately a 
two-fold drop at the trophozoite stage, both inside genes 
and in intergenic regions (Figure 2B), similar to reductions 
in Chip DNA yield (Figure IB) and histone abundance at 
this stage (Figure IC). In agreement with a previous data 
set [18], these results suggest a binary structure of the 
chromatin, with strong nucleosome packaging early in the 



cell cycle, nucleosome depletion in trophozoite stage to 
allow massive transcription and DNA replication, and tight 
re-compaction at the late stage in preparation for egress of 
daughter cells and re-invasion of new red blood cells. Fur- 
thermore, nucleosome positioning obtained from sexually 
mature gametocytes also showed a relatively open chroma- 
tin structure, consistent with the level of transcriptional 
activity at this stage (Figure 2). 

Nucleosome occupancy inside genes was higher as com- 
pared to the relatively nucleosome-depleted intergenic re- 
gions (Figure 2B). Reduced sequence read coverage in 
intergenic regions as compared to coding regions was also 
observed for a previously published nucleosome occu- 
pancy data set [20], albeit to a lower extent (Figure 2C). 
The most strongly positioned nucleosomes were observed 
at the start and the end of coding regions (Figure 2D). Bet- 
ween these two strongly positioned nucleosomes, nucleo- 
somal organization in P. falciparum genes appears to be 
relatively random (Figure 2D), in contrast to the evenly 
spaced arrays of nucleosomes observed inside genes of 
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Figure 2 Nucleosome positioning analysis across the erythrocytic cell cycle. A. The total number of nucleosomes in intergenic and coding 
regions as detected in our old MAINE-Seq data set (left panel) [18] compared to the total number of nucleosomes reported by PuFFIN using our 
new high-coverage MNase ChlP-Seq data set (right panel). B. Average scores of nucleosomes located in intergenic and coding regions at different 
time points during the asexual and sexual stages of the erythrocytic cell cycle. Values are expressed as the percentage of the average score of 
nucleosomes located inside genes at the ring stage. C. Average genome-wide sequence read coverage around the translation start site for a 
previously published nucleosome occupancy data set [20]) and the MNase ChlP-Seq data set from this study. Both data sets show a well-positioned 
first nucleosome inside the coding region, as well as reduced coverage in intergenic regions as compared to coding regions. Coverage is expressed as 
the fraction of the highest coverage value among all positions within the plot window. D. Nucleosomal organization in and around coding regions. 
The location and scores of nucleosomes detected by PuFFIN are shown for all 2,262 P. falciparum genes that do not contain any introns. Each line 
represents one gene, sorted from small (top) to large (bottom) and centered around its midpoint More strongly positioned nucleosomes are 
represented by increasingly warmer colors. Troph.: trophozoite; schiz.: schizont; gam.: gametocyte. 
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Other eukaryotes. To validate that the low coverage ob- 
served in intergenic regions was not mainly due to an 
artifact introduced by the low recovery of sequencing 
AT-rich regions, we sequenced parasite genomic DNA, 
which resulted in an even coverage distribution over inter- 
genic and coding regions (Figure lA). In addition, sequen- 
cing results from non-protein-bound DNA (FAIRE-Seq) 
[18] showed a high enrichment of intergenic regions, with 
more than 50% of all sequence reads mapped to intergenic 
regions (Additional file 1: Figure S5). These data suggest 
that the reduced intergenic coverage in our nucleosome- 
bound DNA sequencing data set is not simply the result 
of technical difficulties in sequencing the highly AT-rich 
intergenic regions of the P. falciparum genome. Similar 
nucleosome maps were obtained using sonication ChlP- 
Seq, although the number of mapped nucleosomes 
was lower due to the lower resolution of this data set 
(Additional file 1: Figure S6). Collectively, these results 
confirm that dramatic changes in nucleosome landscape 
occur during the transcriptionally most active erythrocytic 
stages of P. falciparum. 

Nucleosome occupancy in promoter regions strongly 
correlates with transcription levels 

In other eukaryotic organisms, decreased nucleosome 
occupancy in promoter regions is associated with higher 
transcriptional activity [4]. For P. falciparum, this con- 
nection has not been studied in detail. Our most recent 
high coverage nucleosome and transcriptome [29] data 



sets, both obtained at the exact same parasitic stages, 
allowed us to analyze the impact of nucleosome occu- 
pancy in promoter regions on gene expression in P. falci- 
parum. We computed nucleosome occupancy levels for 
the 500 base pair region directly upstream of the transla- 
tion start codon at the trophozoite stage. Genes were 
grouped into ten transcription clusters of 500 genes each, 
based on steady-state mRNA levels. We observed a strong 
correlation between nucleosome density in the promoter 
region (both in terms of number of nucleosomes and nu- 
cleosome level) and transcriptional activity, with highly 
expressed gene clusters showing a more open nucleosome 
organization than clusters of genes with low expression 
values (Figure 3A, B). In addition, for a smaller group of 
genes with annotated transcription start sites, promoter 
regions of highly expressed genes showed decreased nu- 
cleosome occupancy (Figure 3C). Inside coding regions of 
highly expressed genes, we observed an opposite correl- 
ation between nucleosome occupancy and transcriptional 
activity as compared to the promoter regions. Highly tran- 
scribed genes were on average bound by more nucleo- 
somes and at higher levels (Figure 3D, E). A similar, albeit 
slightly weaker, association between nucleosome occu- 
pancy and transcript levels was observed when nucleosome 
maps were generated using the nucleosome positioning 
tool NOrMAL [28] (Additional file 1: Figure S7), indicating 
that this result is not an artifact of the PuFFIN algorithm. 
In addition, the more open chromatin structure at 
promoters of highly expressed genes was replicated in 
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Figure 3 Association between nucleosome occupancy and transcription level at the trophozoite stage. A. The average number of 
nucleosomes per kilobase in the 500 base pair upstream of the translation start site (promoter region) for each transcription cluster of 500 genes. 
B. The average score of nucleosomes located in the promoter region for each transcription cluster. C. Sequence read coverage around the 
transcription start site for the subset of genes with annotated transcription start sites in each transcription cluster. D. The average number of 
nucleosomes per kilobase inside coding regions for each transcription cluster. E. The average score of all nucleosomes located inside coding 
regions for each transcription cluster. Correlation coefficients reported were obtained using the Spearman's rank test. F-J: Same as panels A-E for 
an independent trophozoite-stage nucleosome occupancy data set generated by Bartfai ef al [20]. 
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sonication ChlP-Seq data obtained using either anti-H3 or 
anti-H4 antibodies (Additional file 1: Figure S8), sugges- 
ting that this observation is independent of ChlP-Seq 
methodology. 

To further validate our findings, we analyzed a previously 
published nucleosome occupancy data set generated using 
different experimental procedures for nucleosome isolation 
and library preparation [20]. We observed a similar cor- 
relation between gene expression level and nucleosome 
occupancy in both the promoter region and the coding re- 
gion (Figure 3F-J), indicating that these findings are not a 
result of biases of the methodologies used. 

Similar results were obtained using other publicly 
available RNA-Seq data sets [20,26,30] (Additional file 1: 
Figure S9). The correlation between nucleosome occu- 
pancy and transcription level was also observed when 
genes were divided into 100 clusters of 50 genes (Additional 
file 1: Figure SIO), but was absent for randomly generated 
gene clusters (Additional file 1: Figure Sll). To further in- 
vestigate the dependency between nucleosome occupancy 
and transcript levels, we analyzed previously published 
transcriptome and nucleosome positioning data sets from 
yeast [31,32] and human cells [33] using our analysis pipe- 
line. In yeast, we observed a comparable, but weaker 
association between expression levels and nucleosome 
occupancy in the promoter region (Additional file 1: 
Figure S12). In human, the nucleosome landscape at the 
promoter region has a bimodal distribution, with a 
relatively open organization for expressed genes and a 
relatively closed organization for silent genes (Additional 
file 1: Figure SI 2). Transcriptional control in human cells 
is known to include more complex regulatory elements, 
such as enhancer regions that are located further away 
from the transcription start site. These results could there- 
fore indicate that nucleosomal organization of promoter 
regions may be more important for regulation of tran- 
scriptional activity in organisms with compact genomes, 
such as P. falciparum, than in higher eukaryotes. 

Nucleosome occupancy patterns associated with 
stage-specific gene expression 

Similar associations between transcriptional activity and 
nucleosome characteristics of the promoter region and 
gene region were observed at the ring and schizont 
stages of the asexual cycle, and at the gametocyte stage 
(Additional file 1: Figure SI 3). This may be the result of 
large overlaps between gene clusters with high transcrip- 
tion levels at different stages, as a large proportion of 
genes has high transcriptional activity throughout the 
erythrocytic cell cycle. A comparison of nucleosome 
occupancy for genes with stage-specific transcription 
profiles showed that changes in nucleosome occupancy 
occur at a genome-wide level (Figure 4). However, pro- 
moter regions of genes transcribed at the ring and 



trophozoite stage on average tend to be the most nucleo- 
some-depleted, while promoter regions of genes expressed 
at the schizont and gametocyte stage have on average a 
slightly more compact nucleosome organization (Figure 4). 
Stage-specific transcription factors may be critical for 
regulating expression of these particular genes. In ad- 
dition, we found that var genes, associated with antigenic 
variation and pathogenesis, are strongly bound by nucleo- 
somes, while their promoters are relatively nucleosome- 
depleted (Figure 4). Transcription of the var gene family is 
known to be tightly regulated and involves mutually ex- 
clusive expression of one var gene out of a total of ap- 
proximately 60 gene variants. Transcriptional repression 
of the remaining variants is achieved by a combination 
of upstream and intronic regulatory elements, nuclear 
localization in repressive centers and repressive histone 
marks [34-37]. The nucleosome landscape of this gene 
family is markedly different from non-mr genes and could 
contribute to these alternative mechanisms of transcrip- 
tional regulation [17]. 

AT-repeats are the strongest inhibitory sequences for 
nucleosome binding in P. falciparum 

In other eukaryotes, DNA sequence has been shown to be 
an important determinant of nucleosome positioning. 
Poly(dA:dT) tracts, in particular, are frequentiy present in 
nucleosome-depleted promoter regions and inhibit nu- 
cleosome binding by decreasing the ability of the DNA 
helix to wrap around the nucleosome core [2,3,5] [8-10]. 

To investigate whether sequence-dependent nucleo- 
some binding preferences are present in the AT-rich 
P. falciparum genome, we computed the average fre- 
quency of dinucleotides in all sequence reads that exactly 
matched the length of DNA wrapped around a nucleo- 
some (i.e. 147 bp). In contrast to the strong 10 bp perio- 
dicity observed in other eukaryotes, we observed a weak 
10 bp periodic signal for AA and TT dinucleotides in 
P. falciparum nucleosomes located inside genes, and no 
periodic signal in intergenic regions (Figure 5A, B). We 
also computed the enrichment of all 5-mers located in 
nucleosomes relative to the whole genome. In coding 
regions, nucleosome-bound DNA showed an enrichment 
of 5-mers exclusively composed of G/C and a depletion of 
5-mers exclusively composed of A/T (Figure 5C), com- 
parable to observations in yeast [38]. In contrast, A/T 
5-mers were not as strongly depleted from nucleosomes 
located in intergenic regions, with the exception of 5-mers 
consisting of AT-repeats (ATATA and TATAT; Figure 5D). 
In addition, 5-mers containing a GG dinucleotide bor- 
dered by cytosines showed depletion from intergenic nu- 
cleosomes (CGGCC, CCGGC, and GCGGC; Figure 5D). 
Similar results were obtained using 4-mer or 6-mer distri- 
bution patterns (Additional file 1: Figure S14). Interes- 
tingly, intergenic nucleosomes showed a small enrichment 
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for 5 bp poly(dA:dT) tracts, which were mainly found close 
to the nucleosome dyad (Additional file 1: Figure S15), 
while this 5-mer is strongly unfavorable for nucleosome 
binding in other eukaryotes. For nucleosomes located in 
genes, the level of 5-mer enrichment was inversely corre- 
lated with their genome-wide frequency (Figure 5E). This 
was not observed for intergenic nucleosomes, with the 
exception of 5-mers consisting of AT-repeats (Figure 5F). 

Similar to poly(dA:dT) tracts in other eukaryotes, 
AT-repeats seem to be the most unfavorable sequences 
for nucleosomes in both gene and intergenic regions. 
Since AT-repeats are also the most abundant 5-mers in 
the P. falciparum genome, comprising 3.4% and 10.7% of 
gene and intergenic genome sequence, respectively, these 
sequences may have a profound effect on the nucleosome 
landscape of this parasite. Indeed, AT-repeat sequences 
were strongly depleted inside coding regions of highly 
expressed genes (Figure 5G, Additional file 1: Figure S16) 
and were enriched in promoters of highly expressed genes 
(Figure 5H, Additional file 1: Figure S16), corresponding 
to levels of nucleosome occupancy in these regions. 



Highly expressed genes showed a strong enrichment of 
5-mers with low genome-wide frequency (Figure 51, 
Additional file 1: Figure SI 6), whOe promoter regions of 
high and low expression genes did not show a difference in 
5-mer composition (Figure 5J, Additional file 1: Figure S16). 

Discussion 

Here, we studied nucleosome positioning in the human 
malaria parasite, and its association with transcription 
levels and DNA sequence. Our results suggest that nu- 
cleosome levels are partially DNA-encoded and constitute 
an important control mechanism of gene expression levels 
in P. falciparum. The association between DNA com- 
position (mainly AT-repeat sequences), nucleosome land- 
scape, and transcription levels observed in this study is 
similar to the effect of poly(dA:dT) tracts on nucleosome 
positioning and promoter activity in yeast [10,38]. How- 
ever, to the best of our knowledge, this is the first report 
that details their relation in P. falciparum and provides a 
genome-wide, in vivo validation of the original obser- 
vations in yeast. In addition, this study exposes several 
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Figure 5 Sequence-based nucleosome-binding preferences in P. falciparum. A-B. Frequency of AA 4- TT and GG + CC dinucleotides in 
nucleosome-bound DNA fragments in coding regions (A) and intergenic regions (B). For nucleosomes located inside genes, a weak 10 bp periodicity 
was observed directly internal to the nucleosome boundaries, but disappearing closer to the nucleosome dyad. C-D. Enrichment of 5-mers consisting 
exclusively of A^ or G/C nucleotides for nucleosomes in coding regions (C) and intergenic regions (D) at the ring stage. Enrichment is expressed as 
the log 2 ratio of the frequency of a 5-mer inside nucleosomes and the genome-wide frequency of that 5-mer. E-F. Genome-wide frequency of each 
5-mer plotted against the enrichment of each 5-mer in nucleosomes, shown for nucleosomes in coding regions (E) and intergenic regions (F). 
Nucleosomes inside coding regions show a strong binding preference for 5-mers with low genome-wide frequencies, while this association is absent 
for intergenic nucleosomes. AT-repeat 5-mers are indicated by a red circle. Pearson correlation coefficients are shown in the top right corner of each 
plot. G-H. Enrichment of AT-repeat 5-mers (ATATA and TATAT) in coding regions (G) and in the 500 nucleotides upstream of the translation start site 
(promoter region) (H) relative to their genome-wide frequency, for genes in each transcription cluster. Correlation coefficients were obtained using the 
Spearman's rank test. I-J. Sequence composition in coding regions (I) and promoter regions (J) among genes with different expression levels. 5-mers 
were divided into five groups based on their genome-wide frequency in intergenic regions. For each transcription cluster, the z-normalized average 
log 2 ratio of 5-mer frequency in the gene or promoter versus all genes or intergenic regions, respectively, is plotted for all 5-mer frequency groups. 
Highly expressed genes show an increased prevalence of nucleosome-favorable 5-mers, while genes with low transcription levels show an increased 
prevalence of nucleosome-disfavorable 5-mers. 



distinct differences in nucleosome organization and dy- 
namics between P. falciparum and other eukaryotes that 
will be discussed in more detail below. 

Nucleosome binding preferences 

High AT-content, in particular in the form of poly(clA:dT) 
tracts, are known to influence nucleosome binding affi- 
nities [2,3,5] [8-10]. However, the nucleosome binding 
preferences that we observed in P. falciparum are distinct 
from eukaryotes with lower genomic AT-contents. The 
typical 10 bp periodicity of A/T dinucleotides was absent, 
similar to observations in the AT-rich (77%) amoebe 
Dictyostelium discoideum [39]. In addition, AT-repeat 
sequences, not poly(dA:dT) tracts, seem to be a strong de- 
terminant of nucleosome positioning in P. falciparum. 
The P. falciparum histone H3 shows 92% homology to 
human H3 and 87% to S. cerevisiae H3, which could 
account for differences in nucleosome binding prefer- 
ences. Furthermore, nucleosomes located in the intergenic 
regions of the P. falciparum genome have been found to 
contain histone variants H2A.Z and H2B.Z [40,41]. The 
presence of these histone variants may promote nucleo- 
some deposition in highly AT-rich regions, and could 



explain differences in binding preferences between genie 
and intergenic nucleosomes. 

Given the high prevalence of AT-repeat sequences in 
the P. falciparum genome, it is tempting to speculate 
that the development of chromatin organization as a 
regulatory mechanism for gene expression levels has 
been the driving force for the evolution of the P. falci- 
parum genome towards its high AT-content. This in 
turn raises the question how nucleosome positioning 
and transcriptional activity are connected in Plasmo- 
dium species with lower AT-content, such as P. vivax 
(57.7% AT [42]). However, since presently P. vivax can- 
not be maintained in a continuous culture, this question 
might be difficult to answer. 

Nucleosome landscape in and around coding regions 

Nucleosomal organization around coding regions is rela- 
tively conserved across divergent eukaryotic species, with 
strongly positioned nucleosomes bordering nucleosome- 
depleted promoter regions [6,7]. Downstream nucleo- 
somes are arranged in tightly spaced arrays with increasing 
fuzziness towards the 3 ' end of coding regions. In P. fal- 
ciparum, we observed the most strongly positioned 
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nucleosomes at the start and the end of coding regions, 
with less tightly organized nucleosomes in the gene body. 
The highly AT-rich intergenic regions may act as barriers 
determining the positioning of the first and last nucleo- 
some inside the coding region. In addition, the formation 
of evenly spaced nucleosomal arrays may also be affected 
by the high AT-content of the P. falciparum genome. In- 
stead of being placed at a fixed distance from the neigh- 
boring nucleosome, nucleosomes may preferentially locate 
to regions with the least inhibitory sequences, resulting in 
seemingly more random spacing. Despite these differences 
in nucleosome organization around coding regions, our re- 
sults suggest that the general relation between nucleosome 
occupancy and promoter strength that has previously been 
uncovered in yeast also applies to P. falciparum. 

Global nucleosome dynamics associated with 
transcriptional activity 

The dramatic genome-wide changes in chromatin orga- 
nization during the trophozoite stage are in stark contrast 
with the modest rearrangements in nucleosome position- 
ing that are generally observed during transcription in 
eukaryotes. Our observations in P. falciparum are con- 
sistent with a model in which genome-wide nucleosome 
eviction at the trophozoite stage drives massive transcrip- 
tional activity. This global change in chromatin structure 
is confirmed by a recent study that determined the three- 
dimensional structure of the parasite genome throughout 
its erythrocytic cycle, demonstrating the opening of chro- 
matin at the trophozoite stage [43]. In the absence of a 
large variety of transcription factors, nucleosome occu- 
pancy levels of individual promoters and gene regions 
may fine-tune the level of transcription for each gene by 
carefully regulating the accessibility of promoter regions 
to more general transcription factors and the RNA poly- 
merase II machinery. Interestingly, we observed that 
sexual genes have more densely nucleosome-packed pro- 
moters that may prevent activation of these genes during 
the asexual cell cycle. Nucleosome repackaging during the 
schizont stage reduces the transcriptionally permissive 
state of the genome, and prepares the parasite for the 
process of egress and re-invasion. During stages with 
higher nucleosome occupancy, stage-specific gene expres- 
sion may be tightly controlled by local chromatin remo- 
deling complexes that provide access of highly selective 
transcription factors [44] to promoter regions. Thus, nu- 
cleosome remodeling and specific transcription factors are 
likely to act in concert to regulate both the levels and the 
timing of gene expression. 

Nucleosome depletion during the trophozoite stage 
could also be related to DNA replication, which also 
occurs during this stage. However, DNA replication is 
generally associated with an increased level of histones 
in eukaryotic cells (reviewed in [45]). The mechanisms 



employed by P. falciparum to organize its genome-wide 
nucleosome eviction remain to be fully described. The 
P. falciparum genome is known to encode a relatively 
large amount of chromatin remodeling enzymes [16], and 
many more may be present among its approximately 
1,600 conserved genes for which the function is currently 
unknown. Considering the potential importance of nu- 
cleosome remodeling for cell cycle progression, parasite- 
specific enzymes that are involved in this process may be 
prime targets for novel antimalarial drugs. 

Histone modifications and variants 

Nucleosome occupancy is likely to act in concert with 
other mechanisms to form a regulatory network that con- 
trols gene expression at multiple levels. In other eukar- 
yotes, both histone variants and histone post-translational 
modifications influence chromatin structure and are im- 
portant contributors to gene regulation. In the malaria 
parasite, this correlation is less conserved. P. falciparum 
histones are predominantly acetylated and monome- 
thylated resulting in a transcriptionally permissive state 
[46], while heterochromatin marked by H3K9me3 and 
H3K36me3 is restricted to regions that contain tightly 
regulated gene families, such as var, rif and stevor [47,48]. 
Similarly, H2A.Z is mainly found in the + 1 nucleosome of 
active promoters in other eukaryotes, while the universal 
localization of H2A.Z to intergenic regions in P. falcip- 
arum [40,41] points towards a differential function of this 
histone variant. The correlation between nucleosome oc- 
cupancy and transcript levels as described in this study is 
at least as strong as what has been observed for any his- 
tone modification or variant. We therefore speculate that 
nucleosome occupancy itself could be a major deter- 
minant of gene expression levels in P. falciparum. How- 
ever, it will be necessary to define the role of all epigenetic 
mechanisms of gene regulation to completely understand 
transcriptional regulation in this parasite. 

Conclusions 

The results of this study suggest that nucleosome posi- 
tioning in P. falciparum is strongly influenced by DNA 
sequence composition and that chromatin organization is 
an important regulator of gene expression levels. A deeper 
understanding of chromatin structure and nucleosome dy- 
namics involved in parasite-specific mechanisms of gene 
regulation could contribute to the discovery novel anti- 
malarial drug targets. 

Methods 

Parasite culture 

P. falciparum strains 3D7 and NF54 were cultured in 
white blood cell-depleted human erythrocytes at 5% 
hematocrit as previously described [49]. For asexual time 
points, 3D7 cultures were synchronized twice at ring 
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Stage with 5% D-sorbitol treatments performed eight 
hours apart [50]. Parasite developmental stages were 
assessed by Giemsa-stained blood smears. Cultures at 8% 
parasitemia were harvested 48 hours after the first sorbitol 
treatment (ring stage), and then 18 hours (trophozoite 
stage) and 36 hours thereafter (schizont stage). For the 
MNase ChlP-Seq experiment, ring stage cultures were 
obtained early in the next erythrocytic cycle, after re- 
invasion of parasites (6 hours after the schizont stage). 
The induction of gametocyte-stage parasites in the NF54 
cell line was adapted from a previously published protocol 
[51]. In brief, parasites were synchronized by 5% sorbitol 
lysis and diluted the following day into 75 cm^ flasks to 
reach 0.5% parasitemia at a hematocrit of 8.3% (total 
volume of 15 ml). Parasites were stressed for 3 days by 
daily replacement of 10 ml of culture media. Cultures with 
a 5-10% parasitemia were then induced by increasing 
media to a final volume of 25 ml per flask. For the next 
5 days, cultures were maintained by removing 10 ml of 
media and adding 10 ml of fresh media supplemented 
with 50 mM N-acetyl glucosamine (NAG) to extinguish 
asexual parasites. Subsequently, cultures were fed with 
regular media until harvest of gametocytes at 2% para- 
sitemia 14 days after induction, corresponding to stage 
IV-V of gametocytogenesis. MAINE-Seq [18] and RNA- 
Seq [29] were performed as described previously. ChlP- 
Seq was performed as described below. 

Parasite extractions 

Parasites cultures were pelleted for 5 min at 660 x g at 
4°C and subsequently lysed by 10 min incubation on ice in 
0.15% saponin in water. Parasites were then centrifuged 
for 10 min at 4,211 x g at 4°C, resuspended in PBS, centri- 
fuged for 10 min at 2,000 x g at 4°C, resuspended in PBS, 
transferred to a microcentrifuge tube and centrifuged for 
5 min at 5,000 rpm at RT in a microcentrifuge. Subse- 
quently, parasites were crosslinked for 5 min (MNase 
protocol) or 10 min (sonication protocol) in 1% fresh for- 
maldehyde in PBS at RT. The crosslinking reaction was 
quenched by adding glycine to a final concentration of 
0.125 M and incubating at RT for half the incubation time 
used for crosslinking. Parasites were centrifuged at for 
5 min at 5,000 rpm at 4°C and washed twice in cold PBS 
before parasite pellets were stored at -80°C. Parasite were 
resuspended in nuclear extraction buffer (10 mM HEPES, 
10 mM KCl, 0.1 mM EDTA, 0.1 mM EGTA, 1 mM DTT, 
0.5 mM 4-(2-aminoethyl)benzenesulfonyl fluoride hydro- 
chloride (AEBSF), and EDTA-free protease inhibitor cock- 
tail (Roche, Basel, Switzerland)) and incubated for 30 min 
on ice. Igepal CA-360 (Sigma- Aldrich, St. Louis, MO) was 
added to a final concentration of 0.25%, followed by 
mechanical lysis of parasites by passing the suspension ten 
times through a 26 G y2 inch needle. Parasite nuclei were 
then centrifuged for 20 min at 5,000 rpm at 4°C. 



Chromatin fragmentation by sonication 

Parasite nuclei were resuspended in SDS lysis buffer (1% 
SDS, 10 mM EDTA, 50 mM Tris-HCl pH 8.1, and 
EDTA-free protease inhibitor cocktail), diluted to a con- 
centration equivalent to 0.1 ^g/|J.l DNA, and distributed 
over 1.5 ml TPX polymethylpentene tubes (300 ^il per 
tube; Diagenode, Denville, NJ). Suspensions were soni- 
cated for a total of 25 cycles of 30 s ON and 30 s OFF at 
high intensity using a Bioruptor UCD-200 (Diagenode). 
Samples were centrifuged for 10 min at 14,000 rpm at 
4°C to remove insoluble material, and fragmented chro- 
matin was stored at -80°C until further use. 

Chromatin fragmentation by MNase treatment 

Parasite nuclei were resuspended in MNase buffer (50 mM 
Tris-HCl pH 7.4, 4 mM MgCla, 1 mM CaClj, 2 mM 
AEBSF, and EDTA-free protease inhibitor cocktail) and 
transferred to 1.5 ml TPX polymethylpentene tubes. Nuclei 
were permeabilized by mild sonication for 4 cycles of 15 s 
ON and 45 s OFF at high intensity using a Bioruptor UCD- 
200. Next, 50 U MNase (USB corporation, Cleveland, OH) 
was added, followed by incubation for 5 min at 37°C. 
MNase was inactivated by the addition of EDTA to a final 
concentration of 5 mM and incubation for 5 min at RT 
with agitation. The samples were diluted lOx in immuno- 
precipitation buffer (15 mM Tris-HCl pH 8.1, 150 mM 
NaCl, 1.5 mM EDTA, 1% Triton X-100, 0.1% SDS, 
and EDTA-free protease inhibitor cocktail), incubated for 
10 min at 4°C, and centrifuged for 5 min at 14,000 rpm at 
4°C to remove insoluble material. Fragmented chromatin 
was stored at -80°C until further use. 

Chromatin immunopreclpltatlon 

Chromatin fragmented by sonication was first diluted lOx 
in Chip dilution buffer (0.01% SDS, 1% Triton X-100, 
1.2 mM EDTA, 16.7 mM Tris-HCl pH 8.1, 167 mM NaCl, 
and EDTA-free protease inhibitor cocktail). Immunopre- 
cipitations were then performed using the Chromatin Im- 
munoprecipitation Assay Kit (Millipore, Billerica, MA) 
according to the manufacturer's protocol. In brief, samples 
were precleared with Protein A Agarose beads to reduce 
non-specific background and were then incubated O/N at 
4°C with anti-H3 antibody raised against a peptide within 
the region between residue 100 and the C-terminus that 
is identical between H3 and H3.3 (abl791. Abeam, 
Cambridge, UK), an anti-H4 antibody (05-858, Millipore) 
or a non-specific antibody (custom-made antibody di- 
rected against a putative ubiquitin-activating enzyme El 
[PF3D7_1225800]). Immunocomplexes were recovered 
using Protein A Agarose beads, followed by extensive 
washes in low salt immune complex wash buffer, high salt 
immune complex wash buffer, LiCl immune complex 
wash buffer and finally TE buffer. Chromatin was eluted 
from the antibody by two subsequent incubations of 
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15 min at RT in freshly prepared elution buffer (1% SDS, 
0.1 M NaHCOs). To reverse crosslinking, NaCl was added 
to a final concentration of 0.5 M, followed by O/N incuba- 
tion at 45°C. Samples were then treated with RNase A 
(Life Technologies, Carlsbad, CA) for 30 min at 37°C. 
EDTA (final concentration 8 mM), Tris-HCl pH6.5 (final 
concentration 33 mM) and proteinase K (final concentra- 
tion 66 |ig/ml; New England Biolabs, Ipswich, MA) were 
added, followed by incubation for 2 h at 45°C. DNA was 
recovered by phenol:chloroform:isoamylalcohol extraction 
and ethanol precipitation. DNA was further purified using 
Agencourt AMPure XP beads (Beckman Coulter, Brea, 
CA). 

Library preparations and sequencing 

Libraries from sonication ChIP samples were prepared 
using the Encore Multiplexing System (NuGEN, San 
Carlos, CA) according to the manufacturer's instructions, 
with the following modifications for the high AT-content 
of the P. falciparum genome: libraries were amplified for a 
total of 15 PCR cycles (5 cycles of [15s at 98°C, 30s at 
55°C, 30s at 62°C] followed by 10 cycles of [15 s at 98°C, 
30s at 63''C, 30s at 72°C]) using KAPA HiPi HotStart 
Ready Mix (Kapa Biosystems, Woburn, MA). Libraries 
from MNase ChIP samples were prepared using the 
NEBNext ChlP-Seq Library Preparation kit (NEB) accor- 
ding to the manufacturer's instructions, with the following 
modifications for the high AT-content of the P. falcip- 
arum genome: libraries were amplified for a total of 11 
PCR cycles (3 cycles of [15 s at 98°C, 30s at 55''C, 30s at 
62°C] followed by 8 cycles of [15 s at 98°C, 30s at 63°C, 
30s at 72°C]) using KAPA HiPi HotStart Ready Mbc. 
Libraries were sequenced with a HiSeq 2000 (lUumina, 
San Diego, CA), generating 50 bp paired-end sequence 
reads. Sequence reads are available through the Short 
Read Archive under BioProject numbers SRP026365 (nu- 
cleosome positioning data) and SRP026367 (steady-state 
mRNA data gametocytes). Steady-state mRNA sequence 
data from ring, trophozoite and schizont stages are avail- 
able under accession numbers SRS417027, SRS417268, 
and SRS417269, respectively. 

Sequence mapping 

The first 5 bases and the last base were systematically 
removed from the sequence reads using FastQ Trimmer, 
part of the FASTX-Toolkit (http://hannonlab.cshl.edu/ 
fastx toolkit/index.html). Contaminating adaptor reads 
were removed using Scythe (https://github.com/ucdavis- 
bioinformatics/scythe). Reads were then trimmed for 
bases with a quality score below 30, and reads contai- 
ning any Ns as well as reads shorter than 18 bases were 
discarded using Sickle (https://github.com/ucdavis-bio- 
informatics/sickle). The trimmed sequence reads were 
first mapped to the human genome (HG19, downloaded 



from ftp://ftp.1000genomes.ebi.ac.uk/voll/ftp/), and all 
non-mapped reads were subsequently mapped to P. 
falciparum 3D7 genome v9.0 (downloaded from www. 
plasmoDB.org) using BWA [52] with default error rates, 
allowing a maximum of 1500 bp distance between read 
pairs. Any read that was either non-uniquely mapped 
(Samtools vO.1.18 [53]), not properly paired (Samtools) or 
a PCR duplicate (Picard Tools vl.78 [http://picard.source- 
forge.net/]) was discarded. The final number of mapped 
reads for each library is listed in Additional file 1: Table SI. 

Nucleosome positioning and normalization strategy 

Nucleosome positions and score were computed using 
our novel nucleosome positioning software tool PuFFIN 
[27]. PuFFIN is a multi-scale peak-calling method that 
replaces each nucleosome-enriched paired-end sequence 
read with a Gaussian curve. By summing these Gaussian 
distributions for all paired-end reads, PuFFIN creates a 
smoothed coverage function, in which each peak repre- 
sents a candidate nucleosome location. By changing the 
width of the Gaussian curves, PuFFIN can capture can- 
didate locations at different resolution scales. The num- 
ber of reads mapped to a nucleosome (expressed per 10^ 
of total mapped reads) was used as a score for nucleo- 
some strength. 

To compare nucleosome levels between time points, 
nucleosome scores were normalized by first calculating 
the ratio between the number of reads mapped to the 
P. falciparum genome and the total number of reads 
mapped to both the human and the P. falciparum gen- 
ome. The final normalization factor was obtained by div- 
iding the normalization ratio of each stage by the 
normalization ratio of the ring stage. Normalization ratios 
and final normalization factors are shown in Additional 
file 1: Table S2. The rationale behind this normalization 
strategy is that the amount of human DNA contamination 
is assumed to be relatively constant (see below), while the 
amount of nucleosome-bound P. falciparum DNA reco- 
vered after immunoprecipitation is highly influenced by 
nucleosome levels in the parasite. We therefore expect the 
fraction of sequence reads mapped to the P. falciparum 
genome out of the total number of mapped reads to be 
directly proportional to nucleosome occupancy. Various 
factors complicate the comparison of nucleosome levels at 
different stages: (1) the ratio of human versus P. falcip- 
arum DNA changes during the cell cycle as the parasite 
multiplies its genome while the amount of human cells is 
invariant. Accordingly, the fraction of human sequence 
reads at the schizont stage of the asexual cell cycle is lower 
as compared to the ring stage. (2) Gametocyte cultures are 
harvested at relatively low parasitemia after two weeks of 
maturation and are thus expected to have smaller num- 
bers of viable human white blood cells. (3) The level of 
human DNA contamination may differ from experiment 
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to experiment, depending on the amount human white 
blood cells in culture and subsequent experimental meth- 
odologies. As a result of its non-linear relationship and 
inter-experimental variability, the true ratio between hu- 
man and P. falciparum sequence reads is very difficult (if 
not impossible) to model. We therefore assumed this ratio 
to be constant in each sample. Our normalization strategy 
may result in a slight overestimation of nucleosome levels 
at the schizont stage, but the lower-than-expected per- 
centage of. P. falciparum sequence reads at the trophozoite 
stage (Additional file 1: Table S2) in our study, as well as 
in previously published nucleosome occupancy sequen- 
cing data [20], clearly demonstrates reduced nucleosome 
levels at this time point. Normalization using only P. 
falciparum-mapped sequence reads does not allow a 
quantitative assessment of nucleosome levels, but rather 
eliminates all information about its effective library size. 
Of note, our normalization strategy only impacts the com- 
parison of nucleosome levels between time points. The 
normalization factor is applied in a genome-wide fashion, 
and therefore does not influence the results of analyses 
within each sample {i.e. differences in nucleosome land- 
scape between subsets of genes). 

Nucleosome positioning analysis In transcription clusters 

Since transcription start sites have not been annotated 
genome-wide and the majority of annotated transcription 
start sites in the compact P. falciparum genome are 
located within 500 bp of the translation start site, we used 
the 500 base pair region directly upstream of the transla- 
tion start codon to compute nucleosome characteristics of 
promoter regions. To avoid overlap of these promoter re- 
gions with gene regions, only genes that are located more 
than 500 nucleotides away from the neighboring gene 
were included in the analysis (excluding 136 genes). In 
addition, gene families which may be subject to alternative 
mechanisms of transcriptional regulation were excluded 
{var, rifin, stevor, surfm, Pfmc-2tm; n = 331). Of the 
remaining 5036 genes, the 36 genes with lowest expression 
values were also discarded to obtain a group of 5000 genes 
that could easily be divided into equally sized groups. 

Transcription start sites have been experimentally deter- 
mined approximately 50% of P. falciparum genes [54,55]. 
Non-overlapping genes with a known TSS [19] that were 
located at least 500 bp from the neighboring gene were 
used to plot the sequence coverage around the TSS (see 
"Coverage plots"). 

The average number of nucleosomes and the average 
nucleosome scores per transcription cluster were calcu- 
lated by first computing the average value for each gene, 
and then computing the average value for all genes in each 
gene cluster. 

Genes with stage-specific expression profiles were se- 
lected based on transcription clusters reported in a 



previous study [29]. Genes that were identified as being 
expressed at a single stage of the asexual cell cycle (either 
ring, trophozoite, or schizont stage) and that were ex- 
pressed at relatively low levels at the gametocyte stage 
(bottom 70% expression levels of all genes) were included 
in the stage-specific expression groups. Gametocyte-stage 
specific expressed genes were defined as all genes among 
the top 30% expression levels at the gametocyte stage and 
the bottom 50% expression levels in all other stages. 

Analysis of previously published data sets 

P. falciparum 3D7 nucleosome positioning single-end se- 
quence reads [20] were downloaded (SRX026772/3) and 
trimmed using the same strategy as used for sequence 
read libraries generated in this study. Sequence reads were 
then mapped to P. falciparum 3D7 genome version 9.0 
(www.plasmodb.org) using BWA with default error rates. 
Based on Giemsa-stained blood smears provided with the 
publication, the sample obtained 30 hours post-invasion 
was determined to represent the trophozoite stage. For 
the generation of nucleosome maps, single-end reads were 
extended to 110 bp and nucleosome positions were sub- 
sequently determined using the PuFFIN software. The 
score for each nucleosome was normalized as described in 
the section "Nucleosome positioning and normalization 
strategy". 

Saccharomyces cerevisiae nucleosome positioning se- 
quence reads [31] were downloaded (SRA001003) and 
were mapped to the S. cerevisiae BY4741 genome version 
Toronto_2012 (downloaded from http://www.yeastgen- 
ome.org/) using bowtie2 with " — very-fast-local" setup to 
account for possible adapter contamination. An MNase- 
Seq nucleosome positioning data set for the human cell 
line GM12878 was downloaded from the ENCODE data- 
base (DCC Accession Number wgEncodeEH000922) [33]. 
Nucleosome positions and scores were determined 
using PuFFIN. Analysis of nucleosome positioning in 
relation to transcription levels were performed as de- 
scribed above for P. falciparum, using a previously pub- 
lished RNA-Seq expression data set for S. cerevisiae [32] 
and an RNA-Seq expression data set for the human cell 
line GM12878 generated as part of the ENCODE project 
(file wgEncodeCaltechRnaSeqRawData5ReplGml2878Cell- 
LongpolyaErng32x75.rpkm) for human [33]. 

Coverage plots 

Genome browser plots were obtained using Artemis [56]. 
Coverage plots were prepared by extracting the read counts 
for the region of interest for all genes included in the 
analysis, and subsequently calculating the average value for 
each nucleotide position. Coverage profiles were smoothed 
using overlapping 150 bp sliding windows, and were sub- 
sequently plotted using bioconductor R package ggplot2. 
For coverage heatmaps, read counts per nucleotide were 
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divided by the total number of mapped reads (x 10^) and 
subsequently multiplied by the stage-specific norma- 
lization factors (Additional file 1: Table S2). Nucleotide 
positions with zero read coverage were replaced with the 
lowest overall read coverage (i.e. 0.001). Read counts were 
then converted to log 10 values, and were finally plotted 
in R using the pheatmap package. 

Analysis of nucleosome binding preferences 

For both intergenic and coding regions, enrichment of 5- 
mers was calculated as the log base 2 ratio of the frequency 
of 5-mers within 147 bp nucleosome-bound DNA frag- 
ments (based on intergenic and genie nucleosome positions 
reported by PuFFIN +/- 73 bp) and the 5-mer frequency in 
all intergenic regions and coding regions, respectively. To 
identify differences in sequence composition between genes 
with different expression levels, all 1024 possible 5-mers 
were divided into five groups based on their genome-wide 
frequency in intergenic or coding regions. Subsequentiy, 
the average 5-mer enrichment in promoter and coding re- 
gions was calculated for each transcription cluster of 500 
genes and was z-normalized across the five frequency 
groups. 

Availability of supporting data 

The data sets supporting the results of this article are avail- 
able in the Short Read Archive, [SRP026365, SRP026367, 
SRS417027, SRS417268, SRS417269; http://www.ncbi.nlm. 
nih.gov/sra] . 
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